Some Experiments in Machine Learning Using the Game of Checkers: III -
An Epilogue

Introduction

Some ten years have elapsed since active work was last done on the checker
program described in the two earlier papers with this same title.  It has
recently become apparent that this earlier work was flawed by two basic
assumptions that now appear to be untenable.  It therefore seems appropriate
to review this earlier work and to try to rectify these earlier mistakes.
The present work is, however, in the nature of an epilogue, since the
current and soon-to-be-realized advances in the hardware design of
computers are sure to make obsolete much of the programming effort that now
has to be expended to overcome the speed and memory limitations of
existing computers.  What this paper hopes to do is to record the present
state of the art as a landmark against which future progress can be
measured.

We will briefly review the earlier work.  The reader who does not find
this review adequate might do well to refresh his memory by referring to
these earlier papers.

Two machine learning procedures were described in some detail: (1) a rote
learning procedure, in which a record was kept of the board situations
encountered in actual play together with information as to the results of
the machine analysis of each situation; this information could be referenced
at the terminal board situations of each newly initiated tree search and
thus, in effect, allowed the machine to look ahead further than time would
otherwise permit; and (2) a generalization learning procedure, in which the
program continuously re-evaluated certain learned parameters that were
used to evaluate the terminal board positions of a look-ahead tree search.
The rote learning procedure was characterized by a very slow but continuous
learning rate.  It was most effective in the opening and end-game phases of
the play.  The generalization learning procedure, by way of contrast,
learned at a more rapid rate but soon approached a plateau set by certain
limitations which only slowly became apparent.  It was surprisingly good at
mid-game play but fared badly in the opening and end-game phases.  While
the generalization learning procedure could be applied to actual games
against human opponents, it soon became evident that the learning procedure
could be greatly speeded up by using book learning, in which the play, as
outlined in the book, was always assumed to be along the optimum path, and
the learning coefficients were adjusted incrementally so as to bias the
computer's play in the direction of the book game.
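
To make the rote learning mechanism concrete, the following sketch (in
present-day Python, with a hypothetical board-key encoding and a
hypothetical static scoring routine standing in for the original program's
machinery) shows how stored analysis results can be consulted at the
terminal positions of a newly initiated search:

    # Rote learning sketch: remember the score that an earlier analysis
    # assigned to a position, and reuse it when the same position turns up
    # as a terminal node of a later search.  A hit effectively extends the
    # look-ahead, since the stored score already reflects a search carried
    # out below this position during an earlier game.
    rote_memory = {}   # hashable board key -> score from earlier analysis

    def remember(board_key, analyzed_score):
        # Record the result of a completed machine analysis of a position.
        rote_memory[board_key] = analyzed_score

    def terminal_value(board_key, static_score):
        # At a terminal node of a new tree search, prefer the stored
        # result over the static evaluation when one is available.
        return rote_memory.get(board_key, static_score)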


In the earlier work on generalization learning, as reported in the first
paper, the learned parameters were the coefficients of a Linear
Polynomial.  In subsequent work, reported in the second paper, a Signature
Table scheme was used.  At the time of publication of the second paper, it
was thought that the Signature Table scheme had remedied the major defects
of the generalization learning scheme and that, with a longer learning
period, this scheme would be vastly superior to the linear polynomial
scheme.  These hopes were never realized.  After an initial period in
which this scheme seemed to show real promise, it developed that even
non-master players were able to discover certain basic limitations of the
program, and they were able to lead the machine down avenues of play where
it did very badly indeed.  At the time, it was not immediately obvious how
these limitations could be overcome.  This, coupled with the pressure of
other work, led to the abandonment of further effort.  It has been only
recently that methods of overcoming these limitations have either occurred
to the writer or have been suggested to him by others.
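
The distinction between the two evaluation forms can be stated compactly.
In the sketch below (Python, with illustrative feature values and table
indexing that are assumptions, not the quantities used in the earlier
papers), the linear polynomial scores a position as a weighted sum of
independently measured board properties, while the signature table indexes
a learned table by the joint values of a small group of properties and can
therefore capture interactions among them:

    # (1) Linear polynomial: the score is a weighted sum of measured
    # board properties; learning adjusts the coefficients.
    def linear_eval(features, weights):
        return sum(w * f for w, f in zip(weights, features))

    # (2) Signature table: each small group of related, quantized
    # properties is used jointly as an index into a learned table, so the
    # table entries can express nonlinear interactions within the group.
    def signature_eval(feature_groups, tables):
        # feature_groups: one tuple of quantized values per group
        # tables: one dict per group, mapping value-tuples to scores
        return sum(table[group] for group, table in zip(feature_groups, tables))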

Many of the aspects of the actual playing techniques described in (2), as
contrasted with the board evaluation techniques, need little or no
revision.  Specific techniques such as Alpha-Beta Pruning, Plausibility
Analysis, Multiple-Path Enhanced-Plausibility and Forward Pruning are
still useful, although the degree of dependence on them can be greatly
lessened by improvements in board evaluation.  Lacking a good evaluation
procedure, one must search the game tree along many paths and to great
depths.
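
For completeness, the pruning idea itself can be shown in a few lines.
The following is the textbook form of alpha-beta search (in Python;
successors() and evaluate() are hypothetical interfaces to the game, not
the original program's routines):

    def alphabeta(position, depth, alpha, beta, maximizing):
        moves = successors(position)
        if depth == 0 or not moves:
            return evaluate(position)      # terminal board evaluation
        if maximizing:
            value = float("-inf")
            for child in moves:
                value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:          # opponent will never permit this line,
                    break                  # so the remaining moves are pruned
            return value
        else:
            value = float("inf")
            for child in moves:
                value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
                beta = min(beta, value)
                if alpha >= beta:
                    break
            return value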

What is in question is the learning techniques that have been used up to
now.  Three aspects of learning must be considered: 1) the nature of the
source material that the program is given, 2) how this source material is
processed and stored, and 3) how this learned material is used in
evaluating new situations.  We will discuss these three aspects in some
detail.  People learn in a variety of ways: 1) from their own experience,
making experiments and observing the results, 2) by watching experts and
observing what they do (this watching may be direct or may be done by
reading about the actions of the experts), and finally, 3) by being told by
the experts something about the thought processes that they use in
governing their behavior.  So far, this last form of learning has been
essentially absent from my experiments.  This now seems to be a grievous
omission, and the steps described in this paper report on one way to
rectify this deficiency.

In the earlier reported learning experiments, the program was given the
basic rules of the game and some limited expert knowledge of the factors
that the expert player uses in his evaluation of board situations, as
filtered through the programmer and transmitted to the program by being
written into the program itself.  The program was then exposed either to
direct play against human players or to stored information in the form of
book games, which report, basically, how the better human players react to
the specific board situations that present themselves during play.  At no
time was the program privy to the thought processes that were used by
these expert players in making their decisions, or to the relative weights
that the experts assigned to the various board properties that the program
was able to evaluate.  The program was left with the task of inferring the
proper weights to be assigned to these various properties by a
hill-climbing technique in which it altered coefficients (or numbers
stored in signature tables).  In any specific instance the program was
forced to credit or debit all of the various parameters more or less
indiscriminately, but in such a way that on the average the weights were
gradually changed in such a direction as to cause the program to play more
and more like the experts.  That the program was able to improve its play
in this way is more a triumph of brute speed than an indication of any
essential merit of the procedure.  People do learn in this way, but
fortunately there are much more effective ways of learning.
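
The indiscriminate credit-and-debit step can be sketched as follows (in
Python; the learning rate and the feature encoding are illustrative
assumptions).  Each coefficient is nudged in the direction that would have
raised the score of the book move relative to the move the program
preferred, so that, averaged over many positions, the weights drift toward
expert-like play:

    def adjust_weights(weights, book_features, chosen_features, rate=0.01):
        # Credit features that were stronger in the book move's position
        # and debit those that were stronger in the position the program
        # preferred.  No single adjustment is reliable; only the long-run
        # average moves the coefficients in the right direction.
        return [w + rate * (fb - fc)
                for w, fb, fc in zip(weights, book_features, chosen_features)]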